30 research outputs found
Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming
Loosely coupled programming is a powerful paradigm for rapidly creating
higher-level applications from scientific programs on petascale systems,
typically using scripting languages. This paradigm is a form of many-task
computing (MTC) which focuses on the passing of data between programs as
ordinary files rather than messages. While it has the significant benefits of
decoupling producer and consumer and allowing existing application programs to
be executed in parallel with no recoding, its typical implementation using
shared file systems places a high performance burden on the overall system and
on the user who will analyze and consume the downstream data. Previous efforts
have achieved great speedups with loosely coupled programs, but have done so
with careful manual tuning of all shared file system access. In this work, we
evaluate a prototype collective IO model for file-based MTC. The model enables
efficient and easy distribution of input data files to computing nodes and
gathering of output results from them. It eliminates the need for such manual
tuning and makes the programming of large-scale clusters using a loosely
coupled model easier. Our approach, inspired by in-memory approaches to
collective operations for parallel programming, builds on fast local file
systems to provide high-speed local file caches for parallel scripts, uses a
broadcast approach to handle distribution of common input data, and uses
efficient scatter/gather and caching techniques for input and output. We
describe the design of the prototype model, its implementation on the Blue
Gene/P supercomputer, and present preliminary measurements of its performance
on synthetic benchmarks and on a large-scale molecular dynamics application.
Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200
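The broadcast, scatter, and gather steps described in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: plain directories stand in for node-local file systems, and the function names (`broadcast`, `scatter`, `gather`, `run_demo`), the round-robin placement, and the file names are all assumptions made for illustration.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def broadcast(common_file: Path, node_dirs: list[Path]) -> None:
    """Copy one shared input file into every (hypothetical) node-local cache."""
    for node in node_dirs:
        shutil.copy(common_file, node / common_file.name)


def scatter(inputs: list[Path], node_dirs: list[Path]) -> dict[Path, list[Path]]:
    """Place per-task input files round-robin across the node caches."""
    placement: dict[Path, list[Path]] = {node: [] for node in node_dirs}
    for i, src in enumerate(inputs):
        node = node_dirs[i % len(node_dirs)]
        placement[node].append(Path(shutil.copy(src, node / src.name)))
    return placement


def gather(node_dirs: list[Path], out_dir: Path, pattern: str = "*.out") -> list[Path]:
    """Collect output files from every node cache into one shared directory."""
    out_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for node in node_dirs:
        for f in sorted(node.glob(pattern)):
            collected.append(Path(shutil.copy(f, out_dir / f.name)))
    return collected


def run_demo(num_nodes: int = 3, num_tasks: int = 7) -> list[str]:
    """Run toy tasks that read only node-local files, then gather results."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared"  # stands in for the shared file system
        shared.mkdir()
        nodes = [root / f"node{i}" for i in range(num_nodes)]
        for n in nodes:
            n.mkdir()

        common = shared / "params.txt"
        common.write_text("scale=2\n")
        inputs = []
        for t in range(num_tasks):
            p = shared / f"task{t}.in"
            p.write_text(str(t))
            inputs.append(p)

        broadcast(common, nodes)
        placement = scatter(inputs, nodes)

        # Each toy task multiplies its input by the broadcast scale factor.
        for node, files in placement.items():
            scale = int((node / "params.txt").read_text().strip().split("=")[1])
            for f in files:
                (node / f"{f.stem}.out").write_text(str(int(f.read_text()) * scale))

        results = gather(nodes, shared / "results")
        return [r.read_text() for r in sorted(results)]
```

In this sketch the tasks never touch the shared directory while running; only the three collective steps do, which is the property the abstract attributes to the model.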
Towards Loosely-Coupled Programming on Petascale Systems
We have extended the Falkon lightweight task execution framework to make
loosely coupled programming on petascale systems a practical and useful
programming model. This work studies and measures the performance factors
involved in applying this approach to enable the use of petascale systems by a
broader user community, and with greater ease. Our work enables the execution
of highly parallel computations composed of loosely coupled serial jobs with no
modifications to the respective applications. This approach allows a new, and
potentially far larger, class of applications to leverage petascale systems,
such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O
performance encountered in making this model practical, and show results using
both microbenchmarks and real applications from two domains: economic energy
modeling and molecular dynamics. Our benchmarks show that we can scale up to
160K processor-cores with high efficiency, and can achieve sustained execution
rates of thousands of tasks per second.
Comment: IEEE/ACM International Conference for High Performance Computing,
Networking, Storage and Analysis (SuperComputing/SC) 200
Performance Analysis of a Parallel Discrete Model for the Simulation of Laser Dynamics
This paper presents an analysis of the performance of
a parallel implementation of a discrete model of laser dynamics,
which is based on cellular automata. The performance
of a 2D parallel version of the model is studied as
a first step to test the feasibility of a parallel 3D version,
which is needed to simulate specific laser systems. The 3D
version will have to run on a parallel computer due to its
runtime and memory requirements. The model has been implemented
on a Beowulf Cluster using the message passing
paradigm. The parallel implementation is found to exhibit
a good speedup, allowing us to run realistic simulations of
laser systems on clusters of workstations, which could not
be afforded on an individual machine due to the extensive
runtime and memory size needed.
Ministerio de Educación y Ciencia TIC2002-04498-C05-0
Parallel implementation of a cellular automaton model for the simulation of laser dynamics
A parallel implementation for distributed-memory MIMD
systems of a 2D discrete model of laser dynamics based on cellular
automata is presented. The model has been implemented on a PC cluster
using a message passing library. A good performance has been obtained,
allowing us to run realistic simulations of laser systems in clusters of
workstations, which could not be afforded on an individual machine due
to the extensive runtime and memory size needed.
Ministerio de Educación y Ciencia TIN2005-08818-C04-0
Parallel Cellular Automata-based Simulation of Laser Dynamics using Dynamic Load Balancing
We present an analysis of the feasibility of executing a parallel bioinspired model of laser dynamics, based on cellular automata (CA), on the usual target platform for this kind of application: a heterogeneous non-dedicated cluster. As this model employs a synchronous CA, using the single program, multiple data (SPMD) paradigm, it is not clear in advance whether an appropriate efficiency can be obtained on this kind of platform. We have evaluated its performance, including artificial load to simulate other tasks or jobs submitted by other users. A dynamic load balancing strategy with two main differences from most previous implementations of CA-based models has been used. First, it is possible to migrate load to cluster nodes initially not belonging to the pool. Second, a modular approach is taken in which the model is executed on top of a dynamic load balancing tool, the Dynamite system, gaining flexibility. Very satisfactory results have been obtained, with performance increases from 60% to 80%.
Ministerio de Ciencia e Innovación TIN2007-68083-C02; Junta de Extremadura PRI06A22
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates
CNN-based surrogates have become prevalent in scientific applications to
replace conventional time-consuming physical approaches. Although these
surrogates can yield satisfactory results with significantly lower computation
costs over small training datasets, our benchmarking results show that
data-loading overhead becomes the major performance bottleneck when training
surrogates with large datasets. In practice, surrogates are usually trained
with high-resolution scientific data, which can easily reach the terabyte
scale. Several state-of-the-art data loaders are proposed to improve the
loading throughput in general CNN training; however, they are sub-optimal when
applied to the surrogate training. In this work, we propose SOLAR, a surrogate
data loader, that can ultimately increase loading throughput during the
training. It leverages our three key observations during the benchmarking and
contains three novel designs. Specifically, SOLAR first generates a
pre-determined shuffled index list and accordingly optimizes the global access
order and the buffer eviction scheme to maximize the data reuse and the buffer
hit rate. It then proposes a tradeoff between lightweight computational
imbalance and heavyweight loading workload imbalance to speed up the overall
training. It finally optimizes its data access pattern with HDF5 to achieve a
better parallel I/O throughput. Our evaluation with three scientific surrogates
and 32 GPUs illustrates that SOLAR can achieve up to 24.4X speedup over PyTorch
Data Loader and 3.52X speedup over state-of-the-art data loaders.
Comment: 14 pages, 15 figures, 5 tables, submitted to VLDB '2
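To make the first design point concrete, the sketch below shows why a pre-determined shuffled index list helps buffering: once the full access order is known in advance, the buffer can evict the sample whose next use lies farthest in the future. This is an illustrative assumption, not SOLAR's actual scheme; the look-ahead rule used here is Belady's classic policy, compared against plain LRU, and all function names are hypothetical.

```python
import random
from collections import OrderedDict


def shuffled_epochs(num_samples, num_epochs, seed=0):
    """Build the full access order up front: one seeded shuffle per epoch."""
    rng = random.Random(seed)
    order = []
    for _ in range(num_epochs):
        epoch = list(range(num_samples))
        rng.shuffle(epoch)
        order.extend(epoch)
    return order


def hits_lru(order, capacity):
    """Count buffer hits with least-recently-used eviction (no look-ahead)."""
    buf = OrderedDict()
    hits = 0
    for s in order:
        if s in buf:
            hits += 1
            buf.move_to_end(s)
        else:
            if len(buf) >= capacity:
                buf.popitem(last=False)  # drop the least recently used sample
            buf[s] = True
    return hits


def hits_lookahead(order, capacity):
    """Count buffer hits when eviction uses the known future access order."""
    # next_use[i] is the position of the next access to order[i], or infinity.
    next_use = [float("inf")] * len(order)
    last_seen = {}
    for i in range(len(order) - 1, -1, -1):
        next_use[i] = last_seen.get(order[i], float("inf"))
        last_seen[order[i]] = i

    buf = {}  # sample id -> position of its next use
    hits = 0
    for i, s in enumerate(order):
        if s in buf:
            hits += 1
        elif len(buf) >= capacity:
            victim = max(buf, key=buf.get)  # farthest next use is evicted
            del buf[victim]
        buf[s] = next_use[i]
    return hits
```

For example, `hits_lookahead(shuffled_epochs(100, 5), 20)` can be compared with `hits_lru(...)` on the same order; the look-ahead policy never yields fewer hits, since it is the optimal eviction rule when every miss must be loaded into the buffer.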
The inferior gluteal artery anatomy: a detailed analysis with implications for plastic and reconstructive surgery
Background: The inferior gluteal artery (IGA) is a large terminal branch of the anterior division of the internal iliac artery (ADIIA). There is a significant lack of data regarding the variable anatomy of the IGA. Materials and methods: A retrospective study was conducted to establish anatomical variations, their prevalence and morphometrical data on the IGA and its branches. The results of 75 consecutive patients who underwent pelvic computed tomography angiography (CTA) were analyzed. Results: The origin variation of each IGA was analyzed in detail. Four origin variations were observed. The most common, Type O1, occurred in 86 of the studied cases (62.3%). The median IGA length was 68.50 mm (LQ = 54.29; HQ = 86.06). The median distance from the origin of the ADIIA to the origin of the IGA was 38.22 mm (LQ = 20.22; HQ = 55.97). The median origin diameter of the IGA was 4.69 mm (LQ = 4.13; HQ = 5.45). Conclusions: The present study thoroughly analyzed the complete anatomy of the IGA and the branches of the ADIIA. A novel classification system for the origin of the IGA was created, where the most prevalent origin was from the ADIIA (Type 1; 62.3%). Furthermore, the morphometric properties (such as the diameter and length) of the branches of the ADIIA were analyzed. These data may be useful for physicians performing operations in the pelvis, such as interventional intraarterial procedures or various gynecological surgeries.